TNO at TDT2001: Language Model-Based Topic Detection

Authors

  • Martijn Spitters
  • Wessel Kraaij
Abstract

Topic detection is concerned with the unsupervised clustering of news stories over time. The TNO topic detection system is based on a language modeling approach. To group stories, we combined a simple single-pass method, which establishes an initial clustering, with a reallocation method that stabilizes the clusters within a certain allowed deferral period. The similarity of an incoming story to an existing cluster is defined as the average of its similarities to each story in that cluster. Each individual similarity is computed as the sum of two generative probabilities, one for each story given the other, where both stories are modeled as unigram language models. Because these story language models are based on extremely sparse statistics, the word probabilities are smoothed using a background model.
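
The similarity and single-pass steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the TNO implementation: the function names, the interpolation weight `lam`, the add-one floor on the background model, the length-normalised log scale, and the threshold value are all assumptions made for the example, and the reallocation step within the deferral period is left out.

```python
import math
from collections import Counter


def story_counts(text):
    """Unigram counts for one story (whitespace tokenisation is an assumption)."""
    return Counter(text.lower().split())


def log_prob(target, source, background, lam=0.5):
    """Length-normalised log-probability of `target` under the unigram model
    of `source`, linearly interpolated with a background model.
    Assumes non-empty stories and a non-empty background."""
    src_len = sum(source.values())
    bg_len = sum(background.values())
    vocab = len(background)
    total = 0.0
    for word, n in target.items():
        p_src = source[word] / src_len
        # Add-one floor (assumption) so unseen words never get probability 0.
        p_bg = (background[word] + 1) / (bg_len + vocab)
        total += n * math.log(lam * p_src + (1 - lam) * p_bg)
    return total / sum(target.values())


def story_similarity(a, b, background):
    """Sum of the two generative scores, one in each direction."""
    return log_prob(a, b, background) + log_prob(b, a, background)


def cluster_similarity(story, members, background):
    """Average similarity of `story` to every story in the cluster."""
    return sum(story_similarity(story, m, background) for m in members) / len(members)


def single_pass(stories, background, threshold):
    """Assign each story to its best cluster, or start a new cluster
    when no existing cluster reaches the threshold."""
    clusters = []  # each cluster is a list of story indices
    for i, story in enumerate(stories):
        best, best_score = None, -math.inf
        for cluster in clusters:
            members = [stories[j] for j in cluster]
            score = cluster_similarity(story, members, background)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.append(i)
        else:
            clusters.append([i])
    return clusters


texts = [
    "election vote president campaign",
    "president election campaign poll",
    "earthquake rescue damage city",
    "earthquake city rescue teams",
]
stories = [story_counts(t) for t in texts]
background = sum(stories, Counter())  # toy background built from the stories themselves
clusters = single_pass(stories, background, threshold=-5.0)
print(clusters)
```

In a real stream the background model would more plausibly be estimated from a large separate corpus, and the threshold would be tuned on held-out data; both are left as illustrative choices here.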

Similar resources

The LIMSI Topic Tracking System for TDT2001

In this paper we describe the LIMSI topic tracking system used for the DARPA 2001 Topic Detection and Tracking evaluation (TDT2001). The system relies on a unigram topic model, where the score for an incoming document is the normalized likelihood ratio of the topic model and a general English model. In order to compensate for the very small amount of training data for each topic, document expan...

Using language models for tracking events of interest over time

This paper presents the TNO tracking system which was evaluated at the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. We built a baseline tracking system based on a language modeling approach. This approach had proved to be powerful for the TREC adaptive filtering task and several other IR tasks.

Unsupervised Event Clustering in Multilingual News Streams

The Topic Detection and Tracking (TDT) benchmark evaluation project embraces a variety of technical challenges for information retrieval research. The TDT topic detection task is concerned with the unsupervised grouping of news stories according to the events they discuss. A detection system must both discover new events as the incoming stories are processed and associate incoming stor...

Description of NTU Approach to Link Detection Task in TDT2001

We participated in the link detection task and submitted four runs, including both manual and ASR transcription for audio resources; and both English translation and original Chinese character source stream for Mandarin sources. This paper will propose a method to tell if a pair of news stories discusses the same topic. Several issues are addressed, e.g., how to represent a news story, how to m...

A Language Modeling Approach to Tracking News Events

This paper presents the TNO tracking system for the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. Being a first year participant to the TDT project, our original goal for this year was to build a baseline tracking system based on a language modeling approach. This approach had proved to be powerfu...

Publication date: 2001